Data description

Number of subjects per survey year

with(NDNS, tab1(SurveyYear, graph = FALSE, decimal = 2))
## SurveyYear : 
##             Frequency Percent Cum. percent
## NDNS Year 1       801   13.01        13.01
## NDNS Year 2       812   13.19        26.21
## NDNS Year 3       782   12.71        38.91
## NDNS Year 4      1055   17.14        56.05
## NDNS Year 5       625   10.15        66.21
## NDNS Year 6       663   10.77        76.98
## NDNS Year 7       703   11.42        88.40
## NDNS Year 8       714   11.60       100.00
##   Total          6155  100.00       100.00

Number of subjects per survey year by gender, where Men == 1 and Women == 2

with(NDNS, tabpct(SurveyYear, Sex,  graph = FALSE, decimal = 2))
## 
## Original table 
##              Sex
## SurveyYear        1     2  Total
##   NDNS Year 1   336   465    801
##   NDNS Year 2   350   462    812
##   NDNS Year 3   339   443    782
##   NDNS Year 4   418   637   1055
##   NDNS Year 5   249   376    625
##   NDNS Year 6   254   409    663
##   NDNS Year 7   321   382    703
##   NDNS Year 8   270   444    714
##   Total        2537  3618   6155
## 
## Row percent 
##              Sex
## SurveyYear          1       2  Total
##   NDNS Year 1     336     465    801
##                (41.9)  (58.1)  (100)
##   NDNS Year 2     350     462    812
##                (43.1)  (56.9)  (100)
##   NDNS Year 3     339     443    782
##                (43.4)  (56.6)  (100)
##   NDNS Year 4     418     637   1055
##                (39.6)  (60.4)  (100)
##   NDNS Year 5     249     376    625
##                (39.8)  (60.2)  (100)
##   NDNS Year 6     254     409    663
##                (38.3)  (61.7)  (100)
##   NDNS Year 7     321     382    703
##                (45.7)  (54.3)  (100)
##   NDNS Year 8     270     444    714
##                (37.8)  (62.2)  (100)
## 
## Column percent 
##              Sex
## SurveyYear        1        %     2        %
##   NDNS Year 1   336  (13.24)   465  (12.85)
##   NDNS Year 2   350  (13.80)   462  (12.77)
##   NDNS Year 3   339  (13.36)   443  (12.24)
##   NDNS Year 4   418  (16.48)   637  (17.61)
##   NDNS Year 5   249   (9.81)   376  (10.39)
##   NDNS Year 6   254  (10.01)   409  (11.30)
##   NDNS Year 7   321  (12.65)   382  (10.56)
##   NDNS Year 8   270  (10.64)   444  (12.27)
##   Total        2537    (100)  3618    (100)

Summary of their age

NDNS %>% 
  group_by(SurveyYear, Sex) %>% 
  summarise(N = n(), MeanAge = mean(Age), SDAge = sd(Age), minAge = min(Age), maxAge = max(Age))
## # A tibble: 16 x 7
## # Groups:   SurveyYear [?]
##    SurveyYear  Sex       N MeanAge SDAge minAge maxAge
##    <chr>       <chr> <int>   <dbl> <dbl>  <dbl>  <dbl>
##  1 NDNS Year 1 1       336    49.9  17.3     19     86
##  2 NDNS Year 1 2       465    49.2  17.8     19     94
##  3 NDNS Year 2 1       350    48.9  17.3     19     96
##  4 NDNS Year 2 2       462    50.2  17.9     19     92
##  5 NDNS Year 3 1       339    48.5  16.9     19     87
##  6 NDNS Year 3 2       443    49.6  18.0     19     93
##  7 NDNS Year 4 1       418    51.4  17.0     19     90
##  8 NDNS Year 4 2       637    48.8  17.0     19     94
##  9 NDNS Year 5 1       249    51.7  16.6     19     93
## 10 NDNS Year 5 2       376    49.4  17.8     19     92
## 11 NDNS Year 6 1       254    51.1  17.7     19     93
## 12 NDNS Year 6 2       409    49.0  18.1     19     95
## 13 NDNS Year 7 1       321    51.5  18.3     19     92
## 14 NDNS Year 7 2       382    49.8  18.3     19     89
## 15 NDNS Year 8 1       270    50.4  16.8     19     90
## 16 NDNS Year 8 2       444    49.9  17.7     19     94

Distribution of the dietary data (by Day of Week)

Day1

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSday1_4.Rdata")
with(dta_d1_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          726   11.80        11.80
## Tuesday         847   13.77        25.56
## Wednesday       814   13.23        38.79
## Thursday       1082   17.58        56.38
## Friday         1013   16.46        72.84
## Saturday        848   13.78        86.62
## Sunday          823   13.38       100.00
##   Total        6153  100.00       100.00

Day2

with(dta_d2_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          822   13.36        13.36
## Tuesday         727   11.82        25.17
## Wednesday       848   13.78        38.96
## Thursday        812   13.20        52.15
## Friday         1081   17.57        69.72
## Saturday       1015   16.50        86.22
## Sunday          848   13.78       100.00
##   Total        6153  100.00       100.00

Day3

with(dta_d3_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          846   13.75        13.75
## Tuesday         824   13.40        27.15
## Wednesday       725   11.79        38.94
## Thursday        850   13.82        52.76
## Friday          810   13.17        65.92
## Saturday       1080   17.56        83.48
## Sunday         1016   16.52       100.00
##   Total        6151  100.00       100.00

Day4

with(dta_d4_wide, tab1(DayofWeek, graph = FALSE, decimal = 2))
## DayofWeek : 
##           Frequency Percent Cum. percent
## Monday          994   16.50        16.50
## Tuesday         833   13.82        30.32
## Wednesday       811   13.46        43.78
## Thursday        705   11.70        55.48
## Friday          830   13.77        69.25
## Saturday        792   13.14        82.39
## Sunday         1061   17.61       100.00
##   Total        6026  100.00       100.00

Distribution of the dietary data (by DayNo)

Monday

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSMon_Sun.Rdata")
with(dta_Mon_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             726   21.43        21.43
## 2             822   24.26        45.69
## 3             846   24.97        70.66
## 4             994   29.34       100.00
##   Total      3388  100.00       100.00

Tuesday

with(dta_Tue_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             847   26.21        26.21
## 2             727   22.50        48.72
## 3             824   25.50        74.22
## 4             833   25.78       100.00
##   Total      3231  100.00       100.00

Wednesday

with(dta_Wed_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             814   25.45        25.45
## 2             848   26.52        51.97
## 3             725   22.67        74.64
## 4             811   25.36       100.00
##   Total      3198  100.00       100.00

Thursday

with(dta_Thurs_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1            1082   31.37        31.37
## 2             812   23.54        54.91
## 3             850   24.64        79.56
## 4             705   20.44       100.00
##   Total      3449  100.00       100.00

Friday

with(dta_Fri_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1            1013   27.13        27.13
## 2            1081   28.95        56.08
## 3             810   21.69        77.77
## 4             830   22.23       100.00
##   Total      3734  100.00       100.00

Saturday

with(dta_Sat_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             848   22.70        22.70
## 2            1015   27.18        49.88
## 3            1080   28.92        78.80
## 4             792   21.20       100.00
##   Total      3735  100.00       100.00

Sunday

with(dta_Sun_wide, tab1(DayNo, graph = FALSE, decimal = 2))
## DayNo : 
##         Frequency Percent Cum. percent
## 1             823   21.96        21.96
## 2             848   22.63        44.58
## 3            1016   27.11        71.69
## 4            1061   28.31       100.00
##   Total      3748  100.00       100.00

A limitation of analysing the data by day of the week is that each subject contributed only 2-4 days’ data. We therefore cannot observe any one subject’s dietary changes across a full Monday-to-Sunday week.

Dietary data by day

rm(list=ls(all=TRUE))
load("~/Documents/LSHTMproject/Rcode/NDNSday1_4.Rdata")

vecid <- unique(dfs3$id)

vecid1<-unique(dta_day1$id) # n = 6153
vecid2<-unique(dta_day2$id) # n = 6153
vecid3<-unique(dta_day3$id) # n = 6151
vecid4<-unique(dta_day4$id) # n = 6026


setdiff(vecid, vecid1) # two subjects did not have day 1 data
## [1] 50506161 70908241
setdiff(vecid, vecid2) # two subjects did not have day 2 data
## [1] 31012251 40714261
setdiff(vecid, vecid3) # four subjects did not have day 3 data
## [1] 10914251 11205071 80702191 81210131
setdiff(vecid, vecid4) # 129 subjects did not have day 4 data
##   [1] 10112011 10701161 10702161 10707261 10906181 10910111 10914251
##   [8] 20106041 20116171 20202081 20205081 20301211 20307041 20405101
##  [15] 20509211 20602011 20615041 21002101 21011041 21107031 21113041
##  [22] 21211041 21211101 30113231 30205131 30205201 30402131 30404081
##  [29] 30411081 30417081 30603071 30605131 30609131 30708201 30709031
##  [36] 30906071 30906201 30907251 30912021 31110201 40101011 40104021
##  [43] 40109221 40116011 40214081 40221221 40315101 40402221 40410251
##  [50] 40504211 40506221 40516021 40710081 40710101 40714251 40714261
##  [57] 40803081 40803221 40808081 40814131 40816011 40902051 40904021
##  [64] 41012081 41016131 41202051 50104191 50105161 50306241 50310271
##  [71] 50501271 50504271 50710161 51002141 51002191 51004011 51102241
##  [78] 51203191 51205141 51208041 51209071 60202081 60202261 60206161
##  [85] 60310131 60313021 60405161 60508071 60606271 60808161 60909271
##  [92] 61013261 61102251 61109081 70113191 70302241 70305031 70309241
##  [99] 70311181 70311251 70407251 70613181 70703181 70714181 70802241
## [106] 70812251 70815241 71101061 71206191 80108061 80301061 80301281
## [113] 80302191 80308251 80312241 80405181 80405281 80410131 80611131
## [120] 80713281 80805191 81002251 81004251 81005191 81007061 81101221
## [127] 81110061 81110131 81203221
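
The per-day ID vectors can also be combined to tabulate how many diary days each subject contributed. The following is a minimal sketch on toy data: the vectors `ids_day1` to `ids_day4` and their values are hypothetical stand-ins for `vecid1` to `vecid4`, and each vector is assumed to hold unique IDs, as the `unique()` calls above would guarantee.

```r
# Toy ID vectors standing in for vecid1..vecid4 (hypothetical values)
ids_day1 <- c(101, 102, 103, 104)
ids_day2 <- c(101, 102, 103)
ids_day3 <- c(101, 102, 104)
ids_day4 <- c(101, 104)

# Stack all IDs: a subject appears once for every diary day recorded
days_per_subject <- table(c(ids_day1, ids_day2, ids_day3, ids_day4))

# Number of subjects with 1, 2, 3 or 4 recorded days
days_distribution <- table(days_per_subject)
days_distribution
```

Applied to the real vectors, the same two `table()` calls would show directly how many subjects have 2, 3 or 4 days of data.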

Definition of carbohydrates intake groups:

In this analysis, each subject has 2-4 days’ dietary data, self-recorded during participation. We calculated each subject’s energy intake for every hour and, for hours with recorded food consumption, the percentage of that energy coming from carbohydrates. Four categories were identified:

  1. Not eating;
  2. Eating low carbohydrate food (energy contribution less than or equal to 25%);
  3. Eating medium carbohydrate food (energy contribution between 26% and 75%);
  4. Eating high carbohydrate food (energy contribution higher than or equal to 75%).
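
The four categories above can be expressed as a small classification function. This is a minimal sketch, not the code used in the analysis. The function name `classify_carb_hour` and its arguments `carb_kcal` and `total_kcal` are hypothetical. Hours with zero or missing energy are assumed to mean "not eating", a value of exactly 75% is assumed to fall in the high group, and anything strictly between 25% and 75% is treated as medium.

```r
# Classify each hour by the share of energy coming from carbohydrates.
# carb_kcal:  energy from carbohydrates in that hour (hypothetical input)
# total_kcal: total energy intake in that hour (hypothetical input)
classify_carb_hour <- function(carb_kcal, total_kcal) {
  # Percentage of energy from carbohydrates; NA when nothing was eaten
  pct <- ifelse(total_kcal > 0, 100 * carb_kcal / total_kcal, NA_real_)

  # Assumed boundaries: <= 25 low, >= 75 high, otherwise medium
  grp <- ifelse(is.na(pct), 1L,
         ifelse(pct <= 25, 2L,
         ifelse(pct >= 75, 4L, 3L)))

  factor(grp, levels = 1:4,
         labels = c("Not eating", "Low carbohydrate",
                    "Medium carbohydrate", "High carbohydrate"))
}

# Four example hours: nothing eaten, then 10%, 50% and 80% carbohydrate
example_groups <- classify_carb_hour(carb_kcal  = c(0, 10, 50, 80),
                                     total_kcal = c(0, 100, 100, 100))
as.integer(example_groups)
```

Coding the groups as integers 1 to 4 matches the positive-integer coding that `poLCA` expects for manifest variables.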

LCA analyses by day

Day 1

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d1_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d1_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d1_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d1_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5<-poLCA(f, data=dta_d1_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d1_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d1_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d1_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

Model comparison and selection

Model Comparison. (Day 1, n = 6153)
N of Class  log-likelihood  resid. df       AIC       BIC      aBIC      cAIC  likelihood-ratio  Entropy
         1       -98828.36       6081  197800.7  198284.9  198056.1  198356.9          91153.99        —
         2       -97963.22       6008  196216.4  197191.5  196730.7  197336.5          89423.72    0.788
         3       -97484.48       5935  195405.0  196870.9  196178.2  197088.9          88466.24    0.747
         4       -97120.79       5862  194823.6  196780.5  195855.8  197071.5          87738.86     0.72
         5       -96753.21       5789  194234.4  196682.2  195525.5  197046.2          87003.69    0.679
         6       -96474.32       5716  193822.6  196761.3  195372.7  197198.3          86445.91    0.679
         7       -96163.55       5643  193347.1  196776.7  195156.0  197286.7          85824.37    0.743
         8       -95938.79       5570  193043.6  196964.1  195111.5  197547.1          85374.85    0.741
Note:
Abbreviation: N, number; resid.df, residual degree of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.
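
The fit indices in the table can be recomputed from a model's log-likelihood, number of estimated parameters and sample size. The sketch below uses the standard definitions and is not taken from the original analysis code. The function names `info_criteria` and `relative_entropy` are hypothetical, the aBIC is assumed to use the sample-size adjustment (N + 2) / 24, and the entropy is assumed to be the relative entropy of the posterior class probabilities. The parameter count of 72 for the one-class model is inferred from n minus the residual degrees of freedom (6153 - 6081).

```r
# Information criteria from log-likelihood, parameter count and sample size
info_criteria <- function(llik, npar, n) {
  dev <- -2 * llik
  c(AIC  = dev + 2 * npar,
    BIC  = dev + npar * log(n),
    aBIC = dev + npar * log((n + 2) / 24),  # assumed adjustment
    cAIC = dev + npar * (log(n) + 1))
}

# Relative entropy from an N x K matrix of posterior class probabilities;
# undefined for a single class, so NA is returned in that case
relative_entropy <- function(post) {
  k <- ncol(post)
  if (k < 2) return(NA_real_)
  terms <- ifelse(post > 0, post * log(post), 0)  # treat 0 * log(0) as 0
  1 - (-sum(terms)) / (nrow(post) * log(k))
}

# Day 1, one-class model (npar = 72 is an inference, see above)
day1_k1 <- info_criteria(llik = -98828.36, npar = 72, n = 6153)
round(day1_k1, 1)
```

Under these assumptions the four values agree with the first row of the table to one decimal place. With a fitted `poLCA` object, the inputs would come from its `llik`, `npar`, `N` and `posterior` components.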

Day 1 data (Fig. 7 classes)

Day 2

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d2_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d2_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d2_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d2_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II)
lc5<-poLCA(f, data=dta_d2_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d2_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d2_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d2_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  31.60514 mins

Model comparison and selection

Model Comparison. (Day 2, n = 6153)
N of Class  log-likelihood  resid. df       AIC       BIC      aBIC      cAIC  likelihood-ratio  Entropy
         1       -97465.15       6081  195074.3  195558.5  195329.7  195630.5          88514.23        —
         2       -96552.25       6008  193394.5  194369.6  193908.8  194514.6          86688.42    0.816
         3       -96039.99       5935  192516.0  193982.0  193289.2  194200.0          85663.90    0.737
         4       -95629.78       5862  191841.6  193798.5  192873.7  194089.5          84843.50    0.694
         5       -95306.17       5789  191340.3  193788.1  192631.4  194152.1          84196.26    0.719
         6       -95032.54       5716  190939.1  193877.8  192489.1  194314.8          83649.01    0.683
         7       -94784.05       5643  190588.1  194017.7  192397.1  194527.7          83152.03    0.718
         8       -94546.10       5570  190258.2  194178.7  192326.1  194761.7          82676.12     0.71
Note:
Abbreviation: N, number; resid.df, residual degree of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

Day 2 data (Fig. 7 classes)

Day 3

set.seed(01012)
max_II <- 100000
lc1 <- poLCA(f, data=dta_d3_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2 <- poLCA(f, data=dta_d3_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3 <- poLCA(f, data=dta_d3_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4 <- poLCA(f, data=dta_d3_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5 <- poLCA(f, data=dta_d3_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6 <- poLCA(f, data=dta_d3_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7 <- poLCA(f, data=dta_d3_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8 <- poLCA(f, data=dta_d3_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  31.65244 mins

Model comparison and selection

Model Comparison. (Day 3, n = 6151)
N of Class  log-likelihood  resid. df       AIC       BIC      aBIC      cAIC  likelihood-ratio  Entropy
         1       -96133.21       6079  192410.4  192894.6  192665.8  192966.6          86105.74        —
         2       -95229.29       6006  190748.6  191723.6  191262.9  191868.6          84297.92    0.807
         3       -94708.16       5933  189852.3  191318.2  190625.5  191536.2          83255.65    0.759
         4       -94325.31       5860  189232.6  191189.4  190264.7  191480.4          82489.95    0.774
         5       -93983.02       5787  188694.0  191141.7  189985.0  191505.7          81805.37    0.722
         6       -93654.43       5714  188182.9  191121.4  189732.7  191558.4          81148.19    0.694
         7       -93385.94       5641  187791.9  191221.3  189600.7  191731.3          80611.21    0.715
         8       -93183.36       5568  187532.7  191453.0  189600.4  192036.0          80206.05    0.747
Note:
Abbreviation: N, number; resid.df, residual degree of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

Day 3 data (Fig. 7 classes)

Day 4

set.seed(01012)
max_II <- 100000
lc1<-poLCA(f, data=dta_d4_wide, nclass=1, na.rm = FALSE, nrep=20, maxiter=max_II) #Loglinear independence model.
lc2<-poLCA(f, data=dta_d4_wide, nclass=2, na.rm = FALSE, nrep=20, maxiter=max_II)
lc3<-poLCA(f, data=dta_d4_wide, nclass=3, na.rm = FALSE, nrep=20, maxiter=max_II)
lc4<-poLCA(f, data=dta_d4_wide, nclass=4, na.rm = FALSE, nrep=20, maxiter=max_II) 
lc5<-poLCA(f, data=dta_d4_wide, nclass=5, na.rm = FALSE, nrep=20, maxiter=max_II)
lc6<-poLCA(f, data=dta_d4_wide, nclass=6, na.rm = FALSE, nrep=20, maxiter=max_II)
lc7<-poLCA(f, data=dta_d4_wide, nclass=7, na.rm = FALSE, nrep=20, maxiter=max_II)
lc8<-poLCA(f, data=dta_d4_wide, nclass=8, na.rm = FALSE, nrep=20, maxiter=max_II)

# running time:  27.12666 mins

Model comparison and selection

Model Comparison. (Day 4, n = 6026)
N of Class  log-likelihood  resid. df       AIC       BIC      aBIC      cAIC  likelihood-ratio  Entropy
         1       -92490.81       5954  185125.6  185608.3  185379.5  185680.3          81316.02        —
         2       -91693.48       5881  183677.0  184649.0  184188.2  184794.0          79721.37    0.789
         3       -91150.49       5808  182737.0  184198.4  183505.7  184416.4          78635.39    0.777
         4       -90739.65       5735  182061.3  184012.1  183087.4  184303.1          77813.71    0.717
         5       -90396.21       5662  181520.4  183960.6  182803.9  184324.6          77126.84    0.733
         6       -90104.35       5589  181082.7  184012.3  182623.6  184449.3          76543.11    0.719
         7       -89839.94       5516  180699.9  184118.8  182498.2  184628.8          76014.29    0.625
         8       -89634.70       5443  180435.4  184343.7  182491.1  184926.7          75603.81     0.76
Note:
Abbreviation: N, number; resid.df, residual degree of freedom; AIC, Akaike information criterion; BIC, Bayesian information criterion; aBIC, adjusted BIC; cAIC, consistent AIC; Entropy, a pseudo-r-squared index.

Day 4 data (Fig. 8 classes)